Text mixing shapes the anatomy of rank-frequency distributions.
Abstract
Natural languages are full of rules and exceptions. One of the most famous quantitative rules is Zipf's law, which states that the frequency of occurrence of a word is approximately inversely proportional to its rank. Though this "law" of ranks has been found to hold across disparate texts and forms of data, analyses of increasingly large corpora since the late 1990s have revealed the existence of two scaling regimes. These regimes have thus far been explained by a hypothesis suggesting a separability of languages into core and noncore lexica. Here we present and defend an alternative hypothesis that the two scaling regimes result from the act of aggregating texts. We observe that text mixing leads to an effective decay of word introduction, which we show provides accurate predictions of the location and severity of breaks in scaling. Upon examining large corpora from 10 languages in the Project Gutenberg eBooks collection, we find emphatic empirical support for the universality of our claim.
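As a rough illustration of the rank-frequency relationship described in the abstract, the sketch below counts word occurrences, orders them by rank, and estimates the slope of log frequency against log rank; under an ideal Zipf's law (frequency proportional to 1/rank) that slope is -1. The function names `rank_frequency` and `loglog_slope` are hypothetical, and the plain least-squares fit is only a simple stand-in, not the authors' estimation procedure or their text-mixing model.

```python
from collections import Counter
import math


def rank_frequency(tokens):
    """Return word counts ordered from most to least frequent (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) versus log(rank).

    Assumption: an ordinary least-squares fit on log-log axes, used here
    only for illustration. It is a crude estimator of a scaling exponent.
    """
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    # Toy token stream (made up for the example, not corpus data).
    tokens = "the cat saw the dog and the dog saw the cat".split()
    print(rank_frequency(tokens))

    # Synthetic frequencies that follow f(r) = C / r exactly.
    ideal = [1000.0 / rank for rank in range(1, 51)]
    print(round(loglog_slope(ideal), 6))
```

A break in scaling of the kind the abstract describes would show up as two different slopes when the fit is run separately over low ranks and high ranks of a large, aggregated corpus.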
Similar resources
Text mixing shapes the anatomy of rank-frequency distributions: A modern Zipfian mechanics for natural language
Natural languages are full of rules and exceptions. One of the most famous quantitative rules is Zipf’s law, which states that the frequency of occurrence of a word is approximately inversely proportional to its rank. Though this ‘law’ of ranks has been found to hold across disparate texts and forms of data, analyses of increasingly large corpora over the last 15 years have revealed the existenc...
Design and characterization of biodegradable polymer-clay nanocomposites prepared by solution mixing technique
This paper discusses the preparation of biodegradable polymer/clay nanocomposites, based on the organically modified montmorillonite clay Cloisite 10A and the biodegradable polymer chitosan, by the solution mixing technique, and their characterization. The nanocomposites were successfully prepared and their structures were characterized by powder X-ray diffraction (XRD), particle size analyzer (Beckm...
A Comparing between the impacts of text based indexing and folksonomy on ranking of images search via Google search engine
Background and Aim: The purpose of this study was to compare the impact of text-based indexing and folksonomy on image retrieval via the Google search engine. Methods: This study used an experimental method. The sample was 30 images extracted from the book “Gray anatomy”. The research was carried out in 4 stages; in the first stage, images were uploaded to an “Instagram” account so the images are tagge...
Nonparametric signed-rank Shewhart chart with variable sampling interval
A nonparametric control chart based on ranks is used for detecting changes in the median (mean). In this article, the signed-rank control chart is considered with a variable sampling interval. We compared the performance of the signed-rank chart with variable sampling interval (VSI-SR) to the signed-rank chart with fixed sampling interval (FSI-SR); the numerical results demonstrated that the VSI feature is very useful. Bakir[1] showed ...
Journal title:
- Physical review. E, Statistical, nonlinear, and soft matter physics
Volume 91, Issue 5
Pages -
Publication date: 2015